Support http2 upstream phase3#32
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b69c248d7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive overhaul of HTTP/2 connection and stream lifecycle management. Key additions include a "held-fallback" retry mechanism for buffering 5xx responses, improved GOAWAY frame handling to differentiate between processed and unprocessed requests, and a transition to unique_ptr ownership for connections within the H2 table. The implementation utilizes a dual-token pattern (raw pointer combined with an atomic liveness flag) to ensure thread-safe access from asynchronous transport callbacks. One high-severity issue was identified in the DrainPausedBuffer implementation, where a lambda capture lacks a liveness check, creating a potential use-after-free vulnerability if the connection is destroyed before the replay drain listener fires.
HTTP/2 upstream Phase 3 (PR-2): wait-queue + checkout/lifecycle overhaul
Ships the wait-queue and cold-start lifecycle overhaul tracked in
HTTP2_UPSTREAM_PHASE_RECONCILIATION.md§2 (PR-2). Touches partition, lease, H2 connection, and proxy transaction. Held-fallback retry on truncation (PR-1.5) and the per-(host,port,endpoint) ALPN cache remain deferred to a follow-up.Summary
OpenNewH2Connection+OnH2ConnectHandshakeCompletesingle-disposition state machine. Single outbound connect per (host, port); concurrent CheckoutAsync calls dedup onto the in-flight probe viah2_connecting_conns_.count(key).prefer=auto→ adopt the negotiated transport into either H2 (AdoptLease+h2_table_.Insert) or the H1 idle pool (MarkTransferred+AdoptAsH1Connection+ReclassifyH2WaitersToAny+ServiceWaitQueue).prefer=alwaysstrictly fails on non-h2 ALPN.WaiterKind { ANY, H2_STREAM_SLOT }discriminator.ServiceWaitQueue's idle-pop and create-new branches skipH2_STREAM_SLOTentries; admission flows throughDrainH2StreamWaitersForHostafter a probe resolves to h2.OnGoawayReceivedwithactive_streams==0now hands the dying conn topending_destroy_h2_conns_(deferred-destroy stash, reaped fromHandleBytespost-recv flush — avoids UAF in the recv-callback chain) and firesStartH2ReplacementConnect(host, port)to keep queued waiters from timing out.RESULT_GOAWAY_UNPROCESSED = -12(peer demonstrably never processed the stream; zero-delay first retry via CONNECT_FAILURE) andRESULT_GOAWAY_MAYBE_PROCESSED = -13(standard retry). Both breaker-neutral (connection-lifecycle, not health signal). NewIsH2RetryableCode/MapH2CodeToRetryConditionhelpers.UpstreamLeaseis now a tagged variantKind { EMPTY, H1, H2 }. H2 ctor carries(partition_alive, conn_alive)dual tokens; accessors (GetH2Connection,GetH2StreamId,GetH2Stream) gate on both.H2ConnectionTablemigrated fromshared_ptr→unique_ptrownership with explicitExtract(UpstreamH2Connection*)forMoveConnToPendingDestroy.DestroyOnDispatcher6-step ordering (alive flip → callback null → timer cleanup → session teardown → MarkClosing → lease.reset → destroyed flag) +RunDeferredEraseWalktwo-phase stream-erase walker.HostPortKeyuser-defined struct in new header (deliberately not astd::pairalias — specializingstd::hashfor a library type is UB per[namespace.std]).h2_upstream_test.h(S6–S13, A6b, A6c, B12b update, N9o/N9p early-final-headers + shutdown-kill, B17/B18/B19 wire RST/GOAWAY) andupstream_pool_test.h(lease Kind discrimination, HostPortKey hash + equality, extended error codes).File-level scope
include/upstream/host_port_key.h(new)std::hashspecializationinclude/upstream/pool_partition.h,server/pool_partition.ccOpenNewH2Connection,OnH2ConnectHandshakeComplete,AdoptAsH1Connection,ReclassifyH2WaitersToAny,StartH2ReplacementConnect,MoveConnToPendingDestroy,ReapPendingDestroyH2Conns,EnqueueH2StreamSlotWaiter,DrainH2StreamWaitersForHost,FailH2StreamSlotWaiters,WaiterKind,h2_connecting_conns_,pending_destroy_h2_conns_include/upstream/upstream_h2_connection.h,server/upstream_h2_connection.ccpartition_back-pointer +SetPartition,MarkTransferred,transferred_flag,DestroyOnDispatcher6-step,RunDeferredEraseWalk, GOAWAY-idle branch,ClearH2TransportCallbackshelper, dtor safety-netinclude/upstream/upstream_lease.hKind { EMPTY, H1, H2 }, H2 ctor with dual tokens, H2 accessorsinclude/upstream/h2_connection_table.h,server/h2_connection_table.ccunique_ptrmigration,Extract(UpstreamH2Connection*)include/upstream/proxy_transaction.h,server/proxy_transaction.ccRESULT_GOAWAY_UNPROCESSED/RESULT_GOAWAY_MAYBE_PROCESSED,IsH2RetryableCode,MapH2CodeToRetryCondition,h2_conn_+h2_conn_alive_dual-token capture,H2ConnAlive()accessorinclude/upstream/upstream_h2_stream.h,include/upstream/upstream_response_sink.hinclude/config/server_config.h,server/config_loader.ccMakefilehost_port_key.hadded toUPSTREAM_HEADERStest/h2_upstream_test.h,test/upstream_pool_test.hNotable correctness work uncovered during review
The
gateway-code-reviewerpass surfaced two real bugs that the test suite missed; both fixed in this PR:OpenNewH2Connection'sRegisterOutboundCallbacksthrow path leakedoutstanding_conns_+++ the parked transport inconnecting_conns_. The pre-fix catch handler synchronously recursed intoOnH2ConnectHandshakeComplete(CHECKOUT_CONNECT_FAILED), but at that point the H2 shell had not yet been inserted intoh2_connecting_conns_, so the disposition handler short-circuited and skippedExtractFromConnecting. Fix: drop the local shell (so its safety-net dtor doesn't race the synchronous close-callback chain), thenExtractFromConnecting(raw_uc)+DestroyConnection(owned).DestroyConnection'sForceClosefires the close-cb synchronously, the close-cb callsOnH2ConnectHandshakeCompletewith shell=null + uc_raw=null and fails queued waiters;DestroyConnectionthen decrementsoutstanding_conns_and tears down the transport.ClearH2TransportCallbackscould clobber the H1 borrower's callbacks just installed byWirePoolCallbacks. If~UpstreamH2Connection's safety-net path ran afterAdoptAsH1Connection, it would null-assignSetOnMessageCb/SetCloseCb/ etc. on the freshly-adopted H1 idle conn. Fix:MarkTransferred()setstransferred_=trueBEFORE adoption, and bothDestroyOnDispatcherand~UpstreamH2Connectionshort-circuit on the flag.Defense-in-depth additions:
AcquireH2Connectionnow defensively nullsSetConnectCompleteCallback/SetHandshakeCompleteCallbackat the H2 session install path (eliminates a class of "closures from a prior owner survive across a promotion boundary" footguns).Test results
Per-suite:
./test_runner h2_upstream— 128/128./test_runner upstream— 42/42CI coverage
./test_runner upstreamand./test_runner h2_upstreamwere already wired inci.yml(tsan-rest enumeration + macOS subset) andweekly-valgrind.yml. PR-2 extends existing suites; no new suite registration required.Out of scope
RESULT_TRUNCATED_RESPONSEis terminal (502 BadGateway). Closes r85 criteria 22-24, 26, 30, 57, 59, 70.